On the Window-disjoint-orthogonality of Speech Sources in Reverberant Humanoid Scenarios

Authors

  • Sylvia Schulz
  • Thorsten Herfet

Abstract

Many speech source separation approaches assume that speech sources are orthogonal in the time-frequency domain. Under this assumption, the target speech source is demixed by applying the ideal binary mask to the mixture. So far, the time-frequency orthogonality of speech sources has been investigated in detail only for anechoic, artificially mixed speech. This paper evaluates how much the orthogonality of speech sources decreases in a realistic, reverberant humanoid recording setup, and it indicates strategies for improving the separation performance of algorithms based on ideal binary masks under these conditions. It is shown that the signal-to-interference ratio (SIR) of the target source demixed with the ideal binary mask decreases by approximately 3 dB at a reverberation time of T60 = 0.6 s compared with the anechoic scenario. In humanoid setups, the spatial distribution of the sources and the choice of ear channel cause SIR differences of a further 3 dB, which motivates specific strategies for choosing the best channel for demixing.
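
Ideal-binary-mask demixing and the SIR measurement described above can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes that the target and interference signals are available separately at each ear (the same oracle knowledge the ideal binary mask already requires), uses a plain Hann-windowed STFT with illustrative parameters n_fft = 512 and hop = 256, applies a 0 dB local criterion, and measures SIR directly in the time-frequency domain. The helper best_channel is a hypothetical oracle channel selector, and the two "ear" signals are scaled white-noise toys, not reverberant recordings.

```python
import numpy as np

def stft(x, n_fft=512, hop=256):
    """Hann-windowed STFT returned as a (frames, frequency bins) complex array.
    Assumption: n_fft and hop are illustrative choices, not the paper's settings."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[k * hop:k * hop + n_fft] * window for k in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def ideal_binary_mask(target_tf, interf_tf):
    """Oracle mask (0 dB local criterion): 1 where the target magnitude exceeds the interference."""
    return (np.abs(target_tf) > np.abs(interf_tf)).astype(float)

def masked_sir_db(mask, target_tf, interf_tf):
    """SIR in dB of mask * mixture, measured in the TF domain.
    The STFT is linear, so mask * (S + N) splits exactly into mask * S and mask * N."""
    eps = 1e-12  # guards against division by zero and log of zero for empty masks
    target_energy = np.sum(np.abs(mask * target_tf) ** 2)
    interf_energy = np.sum(np.abs(mask * interf_tf) ** 2)
    return 10 * np.log10((target_energy + eps) / (interf_energy + eps))

def best_channel(channels):
    """Hypothetical oracle channel selection: `channels` maps an ear name to the
    (target, interference) signals received at that ear; returns the ear whose
    ideal-binary-mask output has the highest SIR, plus all per-ear scores."""
    scores = {}
    for name, (target, interf) in channels.items():
        s_tf, n_tf = stft(target), stft(interf)
        scores[name] = masked_sir_db(ideal_binary_mask(s_tf, n_tf), s_tf, n_tf)
    return max(scores, key=scores.get), scores

# Toy usage with white noise standing in for speech (assumption: no reverberation).
rng = np.random.default_rng(0)
target = rng.standard_normal(16000)
interf = rng.standard_normal(16000)

s_tf, n_tf = stft(target), stft(interf)
sir_mixture = masked_sir_db(np.ones(s_tf.shape), s_tf, n_tf)  # no masking, roughly 0 dB here
sir_ibm = masked_sir_db(ideal_binary_mask(s_tf, n_tf), s_tf, n_tf)

# Hypothetical ear signals: the target is louder at the left ear, the interferer at the right.
channels = {"left": (1.0 * target, 0.5 * interf), "right": (0.5 * target, 1.0 * interf)}
choice, scores = best_channel(channels)
print(f"mixture SIR {sir_mixture:.1f} dB, IBM SIR {sir_ibm:.1f} dB, best ear: {choice}")
```

On this toy input the mask keeps only bins where the target dominates, so its SIR exceeds that of the unmasked mixture, and the left ear (where the target is 6 dB stronger) wins the channel comparison. In real reverberant recordings the outcome depends on source positions and room acoustics, which is the effect the abstract quantifies.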

Similar resources

Adaptive time-frequency analysis for cognitive source separation

This thesis introduces a framework for separating two speech sources in non-ideal, reverberant environments. The source separation architecture tries to mimic the extraordinary abilities of the human auditory system when performing source separation. A movable human dummy head residing in a normal office room is used to model the conditions humans experience when listening to complex auditory s...

8 The DUET Blind Source Separation

This chapter presents a tutorial on the DUET Blind Source Separation method which can separate any number of sources using only two mixtures. The method is valid when sources are W-disjoint orthogonal, that is, when the supports of the windowed Fourier transform of the signals in the mixture are disjoint. For anechoic mixtures of attenuated and delayed sources, the method allows one to estimate...

A Stochastic Speech Model Supporting W-Disjoint Orthogonality

In previous work, we have successfully used an ideal joint sparseness assumption: W-Disjoint Orthogonality (WDO). This assumption, that the time-frequency representations of the sources have disjoint support, is satisfied in an approximate sense by many signals of practical interest, including speech. Here we discuss results derived from a stochastic model of speech signals that justify the WDO...

Blind Source Separation of Speech Mixtures using a Simple and Computationally Efficient Time-Frequency Approach

A very simple and extremely computationally efficient algorithm for blind separation of two speech sources from two mixtures is presented in this paper. The algorithm exploits the approximate W-disjoint orthogonality of speech signals and assumes a specific sensor (microphone) arrangement that allows the sources to possess a feature we call cross high-low diversity. Two sources are said to be cross...

Blind Source Separation Based on Space-time-frequency Diversity

We investigate the assumption that sources have disjoint support in the time domain, time-frequency domain, or frequency domain. We call such signals disjoint orthogonal. The class of signals that approximately satisfies this assumption includes many synthetic signals, music and speech, as well as some biological signals. We measure the disjoint orthogonality of the benchmark signals in the ICA...

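The summaries above describe W-disjoint orthogonality as the requirement that the windowed Fourier transforms of the sources have disjoint support, a condition speech satisfies only approximately. One way to quantify how far a mask-based separation falls short is sketched below. It assumes a binary mask M, the STFT S_j of the source of interest, and the STFT Y_j of the sum of all other sources, and computes the preserved-signal ratio PSR = ||M S_j||^2 / ||S_j||^2, the SIR = ||M S_j||^2 / ||M Y_j||^2, and the combined score (||M S_j||^2 - ||M Y_j||^2) / ||S_j||^2, which reaches 1 only for perfectly disjoint sources. The function name wdo_measures and the random toy spectrograms are hypothetical, and the exact measure is an illustrative assumption rather than the definition used in any of the works listed above.

```python
import numpy as np

def wdo_measures(mask, source_tf, others_tf):
    """Return (PSR, SIR, WDO score) of one source under a time-frequency mask.

    source_tf: STFT coefficients of the source of interest (S_j)
    others_tf: STFT coefficients of the sum of all other sources (Y_j)
    """
    source_energy = np.sum(np.abs(source_tf) ** 2)   # ||S_j||^2
    kept = np.sum(np.abs(mask * source_tf) ** 2)     # ||M S_j||^2
    leaked = np.sum(np.abs(mask * others_tf) ** 2)   # ||M Y_j||^2
    psr = kept / source_energy                       # share of the source the mask preserves
    sir = kept / leaked if leaked > 0 else np.inf    # kept source vs. leaked interference
    wdo = (kept - leaked) / source_energy            # equals psr - psr / sir
    return psr, sir, wdo

rng = np.random.default_rng(1)
shape = (100, 257)  # hypothetical (frames, bins) grid

# Perfectly disjoint toy sources: each occupies its own random half of the bins.
support = rng.random(shape) < 0.5
s1 = np.where(support, rng.standard_normal(shape), 0.0)
s2 = np.where(~support, rng.standard_normal(shape), 0.0)
psr_d, sir_d, wdo_d = wdo_measures((np.abs(s1) > np.abs(s2)).astype(float), s1, s2)

# Fully overlapping toy sources: both are active in every bin.
o1 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
o2 = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
psr_o, sir_o, wdo_o = wdo_measures((np.abs(o1) > np.abs(o2)).astype(float), o1, o2)

print(f"disjoint: PSR {psr_d:.2f}, WDO {wdo_d:.2f}")
print(f"overlapping: PSR {psr_o:.2f}, SIR {10 * np.log10(sir_o):.1f} dB, WDO {wdo_o:.2f}")
```

For the disjoint toy sources the ideal binary mask keeps all of the target and none of the interference, so the score is 1. For overlapping sources, the mask must trade preserved target energy against leaked interference, and the score drops below 1.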

Publication date: 2008